Selecting Systemic Features for Text Classification

Authors

  • Casey Whitelaw
  • Jon Patrick
Abstract

Systemic features use linguistically derived language models as a basis for text classification. The graph structure of these models allows for feature representations not available with traditional bag-of-words approaches. This paper explores the set of possible representations and proposes feature selection methods that aim to produce the most compact and effective set of attributes for a given classification problem. We show that small sets of systemic features can outperform larger sets of word-based features in the task of identifying financial scam documents.


Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods to search, filter and manage resources are highly required. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...


An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management require a tool that delivers the information and data users search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...


A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...


Feature Selecting Model in Automatic Text Categorization of Chinese Financial Industrial News

This work focuses on selecting features in the automatic text categorization of Chinese industrial and financial news. We use a feature selection method suited to the characteristics of subclass Chinese financial and industrial news. However, subclass news remains an open challenge in real-world problems, which are often high-dimensional. Therefore, we proposed a feature selecting model in aut...


Determining the effective features in classification of heart sounds using trained intelligent network and genetic algorithm

Heart diseases are among the most important causes of mortality in the world, especially in industrial countries. Heart sounds and the features extracted from them are among the non-invasive diagnosis and prognosis methods for heart diseases. In this study, the time-scale, cepstral, frequency, temporal and turbulence features are saved and extracted from the heart sounds, and then they ...



Publication date: 2004